NSF PAR Search | NSF Public Access Repository



Local Minima Structures in Gaussian Mixture Models

https://doi.org/10.1109/TIT.2024.3374716

Chen, Yudong; Song, Dogyoon; Xi, Xumei; Zhang, Yuqian (June 2024, IEEE Transactions on Information Theory)

Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture

Zhang, Huijie; Lu, Yifu; Alkhouri, Ismail; Ravishankar, Saiprasad; Song, Dogyoon; Qu, Qing (June 2024, Conference on Computer Vision and Pattern Recognition)

Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-step process: introducing noise into training samples and then employing a model to convert random noise into new samples (e.g., images). However, their remarkable generative performance is hindered by slow training and sampling. This is due to the necessity of tracking extensive forward and reverse diffusion trajectories, and employing a large model with numerous parameters across multiple timesteps (i.e., noise levels). To tackle these challenges, we present a multi-stage framework inspired by our empirical findings, which indicate the advantages of employing distinct parameters tailored to each timestep while retaining universal parameters shared across all timesteps. Our approach segments the time interval into multiple stages and employs a custom multi-decoder U-Net architecture that blends time-dependent models with a universally shared encoder. Our framework enables the efficient distribution of computational resources and mitigates inter-stage interference, which substantially improves training efficiency. Extensive numerical experiments affirm the effectiveness of our framework, showing significant gains in training and sampling efficiency on three state-of-the-art diffusion models, including large-scale latent diffusion models. Furthermore, our ablation studies illustrate the impact of two important components in our framework: (i) a novel timestep clustering algorithm for stage division, and (ii) an innovative multi-decoder U-Net architecture that seamlessly integrates universal and customized parameters.
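The abstract gives no implementation details. As a rough illustration only, the sketch below shows the general routing idea behind a shared encoder with per-stage decoders: each timestep is mapped to a stage, the shared encoder runs for every timestep, and only that stage's decoder produces the output. The class `MultiDecoderToy`, its plain linear layers, and the hand-picked cut points `[300, 700]` are hypothetical; they are not the authors' U-Net and not their timestep clustering algorithm.

```python
import numpy as np


class MultiDecoderToy:
    """Toy shared-encoder / per-stage-decoder model (hypothetical).

    Plain linear layers stand in for U-Net blocks; the stage boundaries
    are supplied by hand, not produced by the paper's clustering algorithm.
    """

    def __init__(self, dim, hidden, boundaries, seed=0):
        rng = np.random.default_rng(seed)
        # Sorted interior cut points: stage k covers timesteps in
        # [boundaries[k-1], boundaries[k]), with open ends at 0 and T.
        self.boundaries = np.asarray(boundaries)
        # One encoder whose weights are shared by every timestep.
        self.encoder = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        # One decoder per stage, used only for timesteps in that stage.
        self.decoders = [
            rng.standard_normal((dim, hidden)) / np.sqrt(hidden)
            for _ in range(len(boundaries) + 1)
        ]

    def stage_of(self, t):
        # side="right" sends a timestep equal to a cut point to the later stage.
        return int(np.searchsorted(self.boundaries, t, side="right"))

    def __call__(self, x, t):
        h = np.tanh(self.encoder @ x)  # shared features
        return self.decoders[self.stage_of(t)] @ h  # stage-specific output


model = MultiDecoderToy(dim=8, hidden=16, boundaries=[300, 700])
x = np.ones(8)
y = model(x, 500)  # routed through the shared encoder and decoder 1
print([model.stage_of(t) for t in (0, 299, 300, 700, 999)])  # [0, 0, 1, 2, 2]
```

Because the encoder is stored once, its parameters are counted once no matter how many stages there are, while each additional stage adds only one decoder's worth of parameters.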
Efficient Low-Dimensional Compression of Overparameterized Models

Kwon, Soo Min; Zhang, Zekai; Song, Dogyoon; Balzano, Laura; Qu, Qing (May 2024, International Conference on Artificial Intelligence and Statistics)

In this work, we present a novel approach for compressing overparameterized models, developed through studying their learning dynamics. We observe that for many deep models, updates to the weight matrices occur within a low-dimensional invariant subspace. For deep linear models, we demonstrate that their principal components are fitted incrementally within a small subspace, and use these insights to propose a compression algorithm for deep linear networks that involves decreasing the width of their intermediate layers. We empirically evaluate the effectiveness of our compression technique on matrix recovery problems. Remarkably, by using an initialization that exploits the structure of the problem, we observe that our compressed network converges faster than the original network, consistently yielding smaller recovery errors. We substantiate this observation by developing a theory focused on deep matrix factorization. Finally, we empirically demonstrate how our compressed model has the potential to improve the utility of deep nonlinear models. Overall, our algorithm improves training efficiency by more than 2× without compromising generalization.
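A minimal sketch of the general idea of shrinking intermediate widths, under an assumption the abstract does not spell out: every layer except the last has rank at most `r`, so projecting each layer's output onto its top-`r` left singular vectors (a generic truncated-SVD step, swapped in here for illustration) loses nothing. The helpers `compress_chain` and `end_to_end` are hypothetical names; this is not the paper's compression algorithm or its structure-exploiting initialization.

```python
from functools import reduce

import numpy as np


def end_to_end(weights):
    """Return W_L @ ... @ W_1 for weights listed first layer to last."""
    return reduce(lambda acc, W: W @ acc, weights[1:], weights[0])


def compress_chain(weights, r):
    """Shrink every intermediate width of a deep linear network to r.

    Generic truncated-SVD projection (an illustrative assumption, not the
    paper's algorithm). The end-to-end map is preserved, up to floating-point
    error, when every layer except the last has rank <= r.
    """
    # Orthonormal basis for the top-r column space of each non-final layer.
    bases = [np.linalg.svd(W, full_matrices=False)[0][:, :r] for W in weights[:-1]]
    compressed = []
    for i, W in enumerate(weights):
        if i < len(weights) - 1:
            W = bases[i].T @ W  # project this layer's output onto r dims
        if i > 0:
            W = W @ bases[i - 1]  # accept the r-dim input from the previous layer
        compressed.append(W)
    return compressed


rng = np.random.default_rng(0)
d, r = 20, 3
# Rank-r layers of width d, mimicking weights that live in a small subspace.
weights = [rng.standard_normal((d, r)) @ rng.standard_normal((r, d)) for _ in range(4)]
small = compress_chain(weights, r)
print([W.shape for W in small])  # [(3, 20), (3, 3), (3, 3), (20, 3)]
```

Applying `end_to_end` to `weights` and to `small` should give the same 20 × 20 matrix up to floating-point error, while the three hidden widths drop from 20 to 3. If a non-final layer's rank exceeded `r`, the projection would discard its smaller singular directions and the end-to-end product would change.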
Efficient Low-Dimensional Compression of Overparameterized Models

Kwon, Soo Min; Zhang, Zekai; Song, Dogyoon; Balzano, Laura; Qu, Qing (May 2024, Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024)

